On the use of semantic awareness to limit overfitting in genetic programming
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
Machine learning and statistics provide powerful tools for solving problems of
many different shapes. But because the algorithms search for approximations, the
problem of overfitting remains present. Genetic Programming is an algorithmic
approach that is likely to produce overfitting solutions. Thus, in order
to lessen the risk of overfitting and increase the generalization ability of genetic
programming, the use of semantic information is assessed in different ways.
A multi-objective system driving the population away from overfitting solutions
based on semantic distance is presented alongside alternatives and extensions.
The extensions include the use of the semantic signature to increase the amount
of information available to the system, as well as the option of replacing the
validation dataset. On the one hand, it is concluded that neither the described approaches
nor any of the extensions have a positive impact on the generalization ability.
On the other hand, the semantics do seem to contain enough information
to appropriately discriminate between overfitting and non-overfitting individuals.
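
To make the idea of a semantic-distance objective more concrete, the following is a minimal sketch and not the dissertation's implementation. It assumes the common convention that the semantics of an individual is its vector of outputs on a fixed set of inputs, that distance is Euclidean, and that the second objective rewards distance from the semantics of a reference individual already judged to be overfitting. All function names and the choice of objectives are hypothetical.

```python
import math

def semantics(program, inputs):
    """Semantics of an individual: its outputs on a fixed list of inputs."""
    return [program(x) for x in inputs]

def semantic_distance(s1, s2):
    """Euclidean distance between two semantic vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))

def objectives_for(program, inputs, targets, overfit_semantics):
    """Two objectives, both minimized (an assumed formulation):
    training error, and negated distance to an overfitting reference."""
    sem = semantics(program, inputs)
    error = semantic_distance(sem, targets)
    return (error, -semantic_distance(sem, overfit_semantics))

def dominates(a, b):
    """True if objective tuple a Pareto-dominates b (all minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(objectives):
    """Indices of the objective tuples that no other tuple dominates."""
    return [
        i for i, a in enumerate(objectives)
        if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)
    ]
```

Under these assumptions, a selection step would keep the individuals returned by `pareto_front`, so that low training error alone is not enough to survive if an individual behaves like the overfitting reference.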